Contributor

@suranjan suranjan commented Jun 3, 2017

Changes proposed in this pull request

  • For a colocated join, use the same txState across tasks: if a task has started a transaction, the next task should reuse it.
  • The row table should also commit the tx at the end of its task.
  • Multiple tasks may try to commit the same tx. At the procedure level, an already committed tx is ignored; if the tasks are on different nodes, the tx is committed regardless of whether that node is the coordinator.
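A minimal sketch of the first and third points, assuming a hypothetical `TxState` handle and `TxContext` holder (neither is the actual txState API): the first task on a thread begins a transaction and keeps it in a `ThreadLocal`, later tasks on the same thread reuse it, and `commit()` is idempotent, so a second commit of the same tx is simply ignored.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong}

// Hypothetical transaction handle; stands in for the real txState.
final class TxState(val id: Long) {
  private val committed = new AtomicBoolean(false)

  // Idempotent commit: only the first caller flips the flag; later calls return false.
  def commit(): Boolean = committed.compareAndSet(false, true)

  def isCommitted: Boolean = committed.get()
}

// Hypothetical per-thread holder so consecutive tasks on one thread share a tx.
object TxContext {
  private val ids = new AtomicLong(0L)
  private val current = new ThreadLocal[TxState]

  // Reuse the open tx left by an earlier task on this thread, or begin a new one.
  def getOrBegin(): TxState = {
    val existing = current.get()
    if (existing != null && !existing.isCommitted) existing
    else {
      val tx = new TxState(ids.incrementAndGet())
      current.set(tx)
      tx
    }
  }

  // Drop the thread-local reference once the shared tx is no longer needed.
  def clear(): Unit = current.remove()
}
```

Note that with this design, whichever task calls `commit()` first ends the transaction for every task still sharing it.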

Patch testing

precheckin

ReleaseNotes.txt changes

(Does this change require an entry in ReleaseNotes.txt? If yes, has it been added to it?)

Other PRs

(Does this change require changes in other projects, such as store, spark, spark-jobserver, or aqp? If so, add links to the related PRs in those subprojects.)

@suranjan suranjan requested review from SachinJanani and sumwale June 3, 2017 04:07
Contributor

@sumwale sumwale left a comment


See some comments. About point #3, I'm not sure how that will work properly unless we ensure that the commit happens only after all tasks are done. Perhaps send a separate message at the end, or do some accounting on the executor side that tracks the partitions scheduled on that executor, so that it commits only when all of those partitions are done.
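The executor-side accounting suggested here could look roughly like the following sketch. `PartitionCommitTracker`, `register` and `partitionDone` are hypothetical names, and the `commit` callback stands in for whatever actually commits the tx; the idea is only that each finishing partition decrements a counter and the last one to finish performs the commit.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical executor-side tracker: commits a tx only after every partition
// of that tx scheduled on this executor has finished.
final class PartitionCommitTracker(commit: String => Unit) {
  private val pending = new ConcurrentHashMap[String, AtomicInteger]()

  // Record how many partitions of this tx were scheduled on this executor.
  def register(txId: String, numPartitions: Int): Unit =
    pending.put(txId, new AtomicInteger(numPartitions))

  // Called at the end of each partition's task; only the last one commits.
  def partitionDone(txId: String): Boolean = {
    val counter = pending.get(txId)
    if (counter != null && counter.decrementAndGet() == 0) {
      pending.remove(txId)
      commit(txId)
      true
    } else false
  }
}
```

Because `decrementAndGet()` is atomic, exactly one caller observes zero, so the commit runs once even when partitions finish concurrently.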

SparkShellRDDHelper.snapshotTxId.set(txid)
val getSnapshotTXId = conn.prepareCall(s"call sys.GET_SNAPSHOT_TXID (?)")
getSnapshotTXId.registerOutParameter(1, java.sql.Types.VARCHAR)
getSnapshotTXId.execute
Contributor

Use empty parens for actions (side-effecting calls), as per Scala convention, and none for getters.
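For reference, the Scala style guide's convention for arity-0 methods, shown on a hypothetical `Counter` class: a method with side effects is declared and called with `()`, while a pure accessor omits them. Applied to the excerpt above, this suggests writing `getSnapshotTXId.execute()`.

```scala
// Illustrates the arity-0 convention: side-effecting methods use (),
// pure accessors do not.
final class Counter {
  private var n = 0

  // Side-effecting action: declared with () and invoked as increment().
  def increment(): Unit = n += 1

  // Pure getter: declared without parens and invoked as value.
  def value: Int = n
}
```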

val txid: String = getSnapshotTXId.getString(1)
getSnapshotTXId.close()
SparkShellRDDHelper.snapshotTxId.set(txid)
logDebug(s"The snapshot tx id is ${txid} and tablename is ${tableName}")
Contributor

The indentation change looks incorrect; the older 2-space indentation was correct.

ps.close()
SparkShellRDDHelper.snapshotTxId.set(null)
}
}
Contributor

Shouldn't this be done after all tasks have finished? Won't a commit in the middle of another task's execution cause trouble?

Contributor Author

Yes, in general. But for read operations, if the txState is already in use by an iterator, we compare against the stored snapshot even if the state has been closed.
For write operations, this would cause an issue if a task committed the tx prematurely; however, I haven't yet found such a scenario in this task.

Contributor

@suranjan Even then, a read in a collocated join may not yet have been scheduled (or may have been scheduled by Spark but not yet by the OS) when another task commits. Won't the two then use different snapshots? I don't see what semantics this change is trying to provide.
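The scenario can be illustrated with a toy model, assuming snapshot reads in which each reader sees whatever was committed at the moment it took its snapshot. `VersionedStore` is hypothetical and is not the store's actual MVCC implementation.

```scala
// Hypothetical MVCC-style store: every commit publishes a new immutable map,
// and a snapshot is just the map that was published when the reader took it.
final class VersionedStore {
  @volatile private var published: Map[String, Int] = Map.empty

  // A reader's view is fixed at the moment it takes the snapshot.
  def snapshot: Map[String, Int] = published

  // Commits are serialized and atomically replace the published version.
  def commit(updates: Map[String, Int]): Unit = synchronized {
    published = published ++ updates
  }
}
```

If one side of the collocated join takes its `snapshot` before another task's `commit(...)` and the other side takes it afterwards, the first side still sees the old value while the second sees the new one, so the join mixes two snapshots.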

Contributor

jramnara commented Jun 3, 2017 via email

Contributor Author

suranjan commented Jun 3, 2017 via email

Suranjan Kumar added 2 commits June 3, 2017 22:33
@suranjan suranjan merged commit 1138a2e into master Jun 3, 2017
Contributor

sumwale commented Jun 3, 2017

@suranjan If we are only providing partition-level isolation semantics, as you mention, then what is this PR trying to achieve by using a TX across tasks (which will be one per partition)? What problem is this trying to solve? This will result in inconsistent semantics.

Contributor Author

suranjan commented Jun 3, 2017 via email

Contributor

sumwale commented Jun 3, 2017

The row buffer scan plus the column buffer scan of a single bucket is a single task. Likewise, a collocated join of two buckets is a single task. A single thread executing multiple tasks in the same job should take a separate snapshot for each task's reads to keep the semantics clean. Even for writes, the row buffer writes and column buffer writes all belong to the same task, though they are in different code segments.

Anyway, since this is already merged, let's discuss what to do going forward later (unless this change causes trouble elsewhere).
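A sketch of the per-task snapshot semantics described in this comment, using a hypothetical `TaskSnapshot` helper: each task takes a fresh snapshot when it starts and clears the thread-local in a `finally` block (similar in spirit to the `snapshotTxId.set(null)` in the excerpt above), so a thread that runs several tasks of the same job never carries one task's snapshot into the next. `takeSnapshot` stands in for whatever call actually obtains a snapshot id.

```scala
// Hypothetical helper: one snapshot per task, never shared across tasks.
object TaskSnapshot {
  // Per-thread slot holding the snapshot of the task currently running on this thread.
  private val current = new ThreadLocal[Option[Long]] {
    override protected def initialValue(): Option[Long] = None
  }

  // Snapshot of the running task, or None when no task is active on this thread.
  def currentSnapshot: Option[Long] = current.get()

  // Takes a fresh snapshot for this task, runs the body, and always clears it afterwards.
  def runTask[T](takeSnapshot: () => Long)(body: Long => T): T = {
    val snap = takeSnapshot()
    current.set(Some(snap))
    try body(snap)
    finally current.remove()
  }
}
```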

Contributor Author

suranjan commented Jun 3, 2017 via email

@sumwale sumwale deleted the smart-conn branch June 4, 2017 07:28
4 participants